Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning

Neural Information Processing Systems

Training offline RL models using visual inputs poses two significant challenges, i.e., the overfitting problem in representation learning and the overestimation bias for expected future rewards. Recent work has attempted to alleviate the overestimation bias by encouraging conservative behaviors. This paper, in contrast, tries to build more flexible constraints for value estimation without impeding the exploration of potential advantages. The key idea is to leverage off-the-shelf RL simulators, which can be easily interacted with in an online manner, as the "test bed" for offline policies. To enable effective online-to-offline knowledge transfer, we introduce CoWorld, a model-based RL approach that mitigates cross-domain discrepancies in state and reward spaces. Experimental results demonstrate the effectiveness of CoWorld, outperforming existing RL approaches by large margins.
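The "flexible constraint" idea can be illustrated with a toy sketch: instead of uniformly penalizing offline value estimates the way conservative methods do, cap the offline critic's target at the estimate of an auxiliary critic trained online in a simulator, plus a slack margin. The function name and the margin mechanism below are hypothetical illustrations, not CoWorld's actual algorithm.

```python
def constrained_value_target(offline_q: float, online_q: float,
                             margin: float = 0.1) -> float:
    """Toy sketch (hypothetical): let the offline value estimate grow
    freely up to the online critic's estimate plus a slack margin,
    rather than applying a uniform conservative penalty to all values."""
    return min(offline_q, online_q + margin)
```

Under this sketch, an over-optimistic offline estimate is clipped toward the online critic's value, while estimates already below that bound pass through untouched, leaving room to explore potential advantages.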


Generating Auxiliary Tasks with Reinforcement Learning

Goldfeder, Judah, So, Matthew, Lipson, Hod

arXiv.org Artificial Intelligence

Auxiliary Learning (AL) is a form of multi-task learning in which a model trains on auxiliary tasks to boost performance on a primary objective. While AL has improved generalization across domains such as navigation, image classification, and NLP, it often depends on human-labeled auxiliary tasks that are costly to design and require domain expertise. Meta-learning approaches mitigate this by learning to generate auxiliary tasks, but typically rely on gradient-based bi-level optimization, adding substantial computational and implementation overhead. We propose RL-AUX, a reinforcement-learning (RL) framework that dynamically creates auxiliary tasks by assigning an auxiliary label to each training example, rewarding the agent whenever its selections improve performance on the primary task. We also explore learning per-example weights for the auxiliary loss. On CIFAR-100 grouped into 20 superclasses, our RL method outperforms human-labeled auxiliary tasks and matches the performance of a prominent bi-level optimization baseline. We present similarly strong results on other classification datasets. These results suggest RL is a viable path to generating effective auxiliary tasks.
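The reward loop described above can be sketched as a toy bandit: an agent assigns one auxiliary label per example and is reinforced by the resulting change in a stand-in primary-task score. All names, the epsilon-greedy policy, and the scoring function are hypothetical illustrations, not the RL-AUX implementation.

```python
import random

def greedy_labels(prefs):
    """Pick each example's currently preferred auxiliary label."""
    return [max(range(len(p)), key=lambda k: p[k]) for p in prefs]

def rl_aux_train(num_examples, num_labels, primary_score,
                 steps=300, eps=0.3, seed=0):
    """Toy bandit-style sketch (hypothetical): assign an auxiliary label
    to each example; the reward is the improvement of a stand-in
    primary-task score over the best score seen so far."""
    rng = random.Random(seed)
    prefs = [[0.0] * num_labels for _ in range(num_examples)]
    best = primary_score(greedy_labels(prefs))
    for _ in range(steps):
        # epsilon-greedy choice of an auxiliary label per example
        labels = [rng.randrange(num_labels) if rng.random() < eps
                  else max(range(num_labels), key=lambda k: prefs[i][k])
                  for i in range(num_examples)]
        score = primary_score(labels)
        reward = score - best              # improvement on the primary task
        for i, a in enumerate(labels):
            prefs[i][a] += reward          # reinforce label choices that helped
        best = max(best, score)
    return greedy_labels(prefs)
```

In the full method the primary score would come from validation performance of the model trained with the chosen auxiliary labels; here it is any callable mapping a label assignment to a number.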


DARIL: When Imitation Learning outperforms Reinforcement Learning in Surgical Action Planning

Boels, Maxence, Robertshaw, Harry, Booth, Thomas C, Dasgupta, Prokar, Granados, Alejandro, Ourselin, Sebastien

arXiv.org Artificial Intelligence

Surgical action planning requires predicting future instrument-verb-target triplets for real-time assistance. While teleoperated robotic surgery provides natural expert demonstrations for imitation learning (IL), reinforcement learning (RL) could potentially discover superior strategies through self-exploration. We present the first comprehensive comparison of IL versus RL for surgical action planning on CholecT50. Our Dual-task Autoregressive Imitation Learning (DARIL) baseline achieves 34.6% action triplet recognition mAP and 33.6% next frame prediction mAP with smooth planning degradation to 29.2% at 10-second horizons. We evaluated three RL variants: world model-based RL, direct video RL, and inverse RL enhancement. Surprisingly, all RL approaches underperformed DARIL--world model RL dropped to 3.1% mAP at 10s while direct video RL achieved only 15.9%. Our analysis reveals that distribution matching on expert-annotated test sets systematically favors IL over potentially valid RL policies that differ from training demonstrations. This challenges assumptions about RL superiority in sequential decision making and provides crucial insights for surgical AI development.
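The autoregressive planning setup can be sketched as a simple rollout loop: predict the next instrument-verb-target triplet, append it to the observed history, and repeat up to the planning horizon. The `model` callable and the triplet values below are hypothetical placeholders, not DARIL's actual interface.

```python
def autoregressive_plan(model, history, horizon):
    """Toy sketch of autoregressive action planning: at each step the
    model predicts the next instrument-verb-target triplet from the
    history so far, and the prediction is fed back in as context."""
    plan = []
    for _ in range(horizon):
        triplet = model(history)       # e.g. ("grasper", "retract", "gallbladder")
        plan.append(triplet)
        history = history + [triplet]  # feed the prediction back as context
    return plan
```

This feedback of the model's own predictions is also why planning quality degrades with horizon, as in the 10-second results reported above: early errors compound through the context.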






Review for NeurIPS paper: Multi-agent active perception with prediction rewards

Neural Information Processing Systems

Weaknesses: The paper is well written and easy to follow. The problem of active perception is also interesting. There are a few areas where more clarification is needed, as pointed out below:
-- The authors have highlighted a number of previous models for the problem of active perception, such as Dec-ρPOMDP, POMDP-IR, etc. Given the focus on converting this problem to a decentralized framework, it is not clearly conveyed why decentralizing the problem is significant. There are hints in the paper, such as reduced communication overhead, but no empirical evidence is presented to justify decentralized approaches over these previous approaches (e.g., how much communication overhead is reduced).
-- The technical approach presented by the authors is elegant and simple, but it is essentially a heuristic. The bound provided in Theorem 1 would seem to be loose in the worst case (and its value in the experiments is not shown).